Heterogeneous network embedding enabling accurate disease association predictions.
Background: Identifying the complex biological mechanisms underlying various diseases is a significant task in biomedical research. Recently, the generation of tremendous amounts of data in genomics, epigenomics, metagenomics, proteomics, metabolomics, nutriomics, etc., has given rise to systematic biological approaches to exploring complex diseases. However, the gap between the production of these multiple data types and our capability to analyze them has gradually widened. Furthermore, many of the above-mentioned data can be represented as networks, and based on the vector representations learned by network embedding methods, entities that lie in close proximity but do not currently possess direct links are very likely to be related; they are therefore promising candidates for biological investigation. Results: We incorporate six public biological databases to construct a heterogeneous biological network containing three categories of entities (i.e., genes, diseases, miRNAs) and multiple types of edges (i.e., the known relationships). To tackle the inherent heterogeneity, we develop a heterogeneous network embedding model that maps the network into a low-dimensional vector space in which the relationships between entities are well preserved. To assess the effectiveness of our method, we conduct gene-disease and miRNA-disease association prediction, the results of which show the superiority of our method over several state-of-the-art approaches. Furthermore, many associations predicted by our method are verified in the latest real-world datasets. Conclusions: We propose a novel heterogeneous network embedding method that adequately exploits the abundant contextual information and structure of a heterogeneous network. Moreover, we illustrate how the proposed method can direct studies in biology and assist in identifying new hypotheses for biological investigation.
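To make the embedding-then-rank idea concrete, here is a toy sketch (not the paper's actual model; the entity names, the tiny association matrix, and the use of truncated SVD as the embedding step are all invented for illustration). Known entity-disease links are factorized into a shared latent space, and unobserved pairs are ranked by their latent-space score as candidate associations:

```python
import numpy as np

# Hypothetical toy network: rows are genes/miRNAs, columns are diseases.
# A 1 marks a known association; a 0 is an unobserved candidate pair.
entities = ["geneA", "geneB", "miR-1", "miR-2"]
diseases = ["d1", "d2", "d3"]
A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

# Low-dimensional embedding via truncated SVD (a simple stand-in for the
# paper's heterogeneous embedding model): entities and diseases end up in
# one shared latent space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
ent_vec = U[:, :k] * s[:k]   # entity embeddings, scaled by singular values
dis_vec = Vt[:k, :].T        # disease embeddings

# Score every unobserved pair; high scores are candidate associations.
scores = ent_vec @ dis_vec.T
candidates = [(entities[i], diseases[j], round(float(scores[i, j]), 3))
              for i in range(len(entities)) for j in range(len(diseases))
              if A[i, j] == 0]
candidates.sort(key=lambda t: -t[2])  # best candidate first
```

The top-ranked pairs in `candidates` would be the "close in embedding space but not yet linked" entities the abstract proposes as subjects for follow-up investigation.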
RDGSL: Dynamic Graph Representation Learning with Structure Learning
Temporal Graph Networks (TGNs) have shown remarkable performance in learning
representation for continuous-time dynamic graphs. However, real-world dynamic
graphs typically contain diverse and intricate noise. Noise can significantly
degrade the quality of representation generation, impeding the effectiveness of
TGNs in downstream tasks. Though structure learning is widely applied to
mitigate noise in static graphs, its adaptation to dynamic graph settings poses
two significant challenges. i) Noise dynamics. Existing structure learning
methods are ill-equipped to address the temporal aspect of noise, hampering
their effectiveness against such dynamic, ever-changing noise patterns. ii) More
severe noise. Noise may be introduced along with multiple interactions between
two nodes, leading to the re-pollution of these nodes and consequently causing
more severe noise compared to static graphs. In this paper, we present RDGSL, a
representation learning method for continuous-time dynamic graphs, together with
dynamic graph structure learning, a novel supervisory signal that empowers RDGSL
to combat noise in dynamic graphs effectively.
To address the noise dynamics issue, we introduce the Dynamic Graph Filter,
where we innovatively propose a dynamic noise function that dynamically
captures both current and historical noise, enabling us to assess the temporal
aspect of noise and generate a denoised graph. We further propose the Temporal
Embedding Learner to tackle the challenge of more severe noise, which utilizes
an attention mechanism to selectively turn a blind eye to noisy edges and
instead focus on normal edges, yielding expressive representations that remain
resilient to noise. Our method demonstrates robustness on downstream tasks,
achieving up to a 5.1% absolute AUC improvement in evolving classification over
the second-best baseline.
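The "turn a blind eye to noisy edges" idea can be sketched as attention whose logits are penalized by an edge-level noise score. This is a minimal illustration, not RDGSL itself: the noise scores are assumed to come from some upstream noise estimator (playing the role of the Dynamic Graph Filter), and the messages and scores below are made up:

```python
import numpy as np

def attended_aggregate(neighbor_msgs, noise_scores, beta=5.0):
    """Aggregate neighbor messages, down-weighting edges judged noisy.

    noise_scores lie in [0, 1]: higher means more likely noise (assumed to
    be produced by an upstream dynamic noise estimator).
    """
    logits = -beta * noise_scores             # noisy edges get small logits
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                  # softmax attention weights
    return weights @ neighbor_msgs            # weighted sum of messages

# Two normal neighbor messages and one large outlier flagged as noise:
msgs = np.array([[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
noise = np.array([0.0, 0.1, 0.95])
out = attended_aggregate(msgs, noise)
```

A plain mean of these messages would be dominated by the outlier (around 3.7 per dimension); the noise-penalized attention keeps the aggregate close to the two normal neighbors instead.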
The combination approach of SVM and ECOC for powerful identification and classification of transcription factor
Background: Transcription factors (TFs) are core functional proteins that play important roles in gene expression control, and they are key factors in constructing gene regulation networks. Traditionally, they have been identified and classified through experimental approaches. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Though these methods have facilitated the screening of TFs to some extent, low accuracy remains a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs from new proteins and classifying them are in high demand. Results: The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to give users access to our tools (see the Availability and requirements section for the URL). Conclusion: The SVM method is a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification; when combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work implies that the ECOC algorithm may succeed in a broad range of biological data-mining applications.
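The SVM + ECOC combination the abstract describes maps directly onto scikit-learn's `OutputCodeClassifier`, which trains one binary SVM per codeword bit and decodes predictions by nearest codeword. A minimal sketch, using the iris dataset as a stand-in for the paper's domain/functional-site feature vectors (which we do not have):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

# Stand-in data: 4 features, 3 classes (the paper's features would be
# protein domains and functional sites instead).
X, y = load_iris(return_X_y=True)

# code_size=2 -> codewords of length 2 * n_classes = 6, so six binary
# linear SVMs are trained; predictions decode to the nearest codeword.
clf = OutputCodeClassifier(LinearSVC(), code_size=2, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)  # training accuracy, for illustration only
```

The error-correcting property comes from the redundant codewords: a single misfiring binary classifier can still decode to the correct class, which is why ECOC tends to help on multi-class problems like TF family classification.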
TIGER: Temporal Interaction Graph Embedding with Restarts
Temporal interaction graphs (TIGs), consisting of sequences of timestamped
interaction events, are prevalent in fields like e-commerce and social
networks. To better learn dynamic node embeddings that vary over time,
researchers have proposed a series of temporal graph neural networks for TIGs.
However, due to the entangled temporal and structural dependencies, existing
methods have to process the sequence of events chronologically and
consecutively to ensure node representations are up-to-date. This prevents
existing models from parallelization and reduces their flexibility in
industrial applications. To tackle the above challenge, in this paper, we
propose TIGER, a TIG embedding model that can restart at any timestamp. We
introduce a restarter module that generates surrogate representations acting as
the warm initialization of node representations. By restarting from multiple
timestamps simultaneously, we divide the sequence into multiple chunks and
naturally enable the parallelization of the model. Moreover, in contrast to
previous models that utilize a single memory unit, we introduce a dual memory
module to better exploit neighborhood information and alleviate the staleness
problem. Extensive experiments on four public datasets and one industrial
dataset are conducted, and the results verify both the effectiveness and the
efficiency of our work. Comment: WWW 2023. Codes: https://github.com/yzhang1918/www2023tige
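The restart mechanism can be sketched in miniature: split the chronological event stream into chunks, and give each chunk a surrogate warm-start state instead of replaying all prior events, so chunks can be processed in parallel. This is a hypothetical simplification; in TIGER the surrogate state is produced by a learned restarter module, whereas here it is just a placeholder:

```python
def split_into_chunks(events, n_chunks):
    """events: list of (timestamp, src, dst), assumed sorted by timestamp."""
    size = -(-len(events) // n_chunks)  # ceiling division
    return [events[i:i + size] for i in range(0, len(events), size)]

def surrogate_state(chunk_start_ts):
    # Placeholder for TIGER's learned restarter: a real model would emit
    # warm-start node representations; here we just record the timestamp.
    return {"warm_start_ts": chunk_start_ts}

# Ten toy interaction events over three nodes:
events = [(t, t % 3, (t + 1) % 3) for t in range(10)]
chunks = split_into_chunks(events, 3)
# Each chunk gets its own initialization, so the chunks no longer have to
# be processed strictly one after another.
states = [surrogate_state(chunk[0][0]) for chunk in chunks]
```

Without the surrogate states, chunk k could not start until chunks 0..k-1 had finished, which is exactly the sequential bottleneck the abstract says prevents parallelization.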
The Challenges of Data Quality and Data Quality Assessment in the Big Data Era
High-quality data are the precondition for analyzing and using big data and for guaranteeing the value of the data. Currently, comprehensive analysis and research on quality standards and quality assessment methods for big data are lacking. First, this paper summarizes reviews of data quality research. Second, it analyzes the data characteristics of the big data environment, presents the quality challenges faced by big data, and formulates a hierarchical data quality framework from the perspective of data users. This framework consists of big data quality dimensions, quality characteristics, and quality indexes. Finally, on the basis of this framework, the paper constructs a dynamic assessment process for data quality. This process has good expansibility and adaptability and can meet the needs of big data quality assessment. The research results enrich the theoretical scope of big data and lay a solid foundation for future work by establishing an assessment model and studying evaluation algorithms.
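The hierarchical framework (dimensions composed of characteristics, each measured by concrete indexes) can be illustrated with a toy roll-up. The records, field names, the two indexes chosen (completeness and uniqueness), and the weights are all invented for this sketch; the paper's framework defines its own dimensions and indexes:

```python
# Toy dataset with some missing cells:
records = [
    {"id": 1, "name": "a", "ts": "2020-01-01"},
    {"id": 2, "name": None, "ts": "2020-01-02"},
    {"id": 3, "name": "c", "ts": None},
]
fields = ["id", "name", "ts"]

# Index 1: completeness = share of non-null cells.
cells = [r[f] for r in records for f in fields]
completeness = sum(v is not None for v in cells) / len(cells)

# Index 2: uniqueness = share of distinct ids.
ids = [r["id"] for r in records]
uniqueness = len(set(ids)) / len(ids)

# Dimension score: weighted roll-up of its characteristic indexes
# (weights are illustrative and would be set by the data users).
weights = {"completeness": 0.6, "uniqueness": 0.4}
quality = (weights["completeness"] * completeness
           + weights["uniqueness"] * uniqueness)
```

Because the weights sit with the data users rather than inside the metric, the same index layer can serve different assessment needs, which is the adaptability the abstract claims for the dynamic assessment process.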